9 research outputs found

    The CONTENT4ALL project


    Statistical models for the analysis of short user-generated documents: author identification for conversational documents

    In recent years, short user-generated documents have been gaining popularity on the Internet and attention in the research communities. Documents of this kind are generated by users of various online services: platforms for instant messaging, real-time status posting, discussion, and review writing. Each of these services allows users to produce written texts with particular properties, which may require specific algorithms to analyse. This dissertation presents our work on analysing such documents. We conducted qualitative and quantitative studies to identify the properties that characterise them. We compared these properties with those of the standard documents employed in the literature, such as newspaper articles, and defined a set of characteristics that are distinctive of documents generated online. We also observed two classes within online user-generated documents: conversational documents and those involving group discussions. We then focused on the class of conversational documents, which are short and spontaneous. We created a novel collection of real conversational documents retrieved online (e.g. Internet Relay Chat) and distributed it as part of an international competition (PAN @ CLEF'12). The competition was about author characterisation, which is one of the authorship attribution studies documented in the literature. Another field of study is authorship identification, which became our main topic of research. We approached the authorship identification problem in its closed-class variant. For each problem we employed documents from the collection we released and from a collection of Twitter messages, as representative of conversational or short user-generated documents.
We demonstrated the unsuitability of standard authorship identification techniques for conversational documents and proposed novel methods capable of reaching better accuracy rates. As opposed to standard methods, which worked well only for a few authors, the proposed technique reached significant results even for hundreds of users.

    Overview of the Author Profiling Task at PAN 2013

    This overview presents the framework and results for the Author Profiling task at PAN 2013. We describe in detail the corpus and its characteristics, and the evaluation framework we used to measure the participants' performance in solving the problem of identifying age and gender from anonymous texts. Finally, the approaches of the 21 participants and their results are described.

    The author profiling task @PAN-2013 was an activity of the WIQ-EI IRSES project (Grant No. 269180) within the FP 7 Marie Curie People Framework of the European Commission. We want to thank the Forensic Lab of the Universitat Pompeu Fabra Barcelona for sponsoring the award for the winning team. The work of the first author was partially funded by Autoritas Consulting SA and by Ministerio de Economía y Competitividad de España under grant ECOPORTUNITY IPT-2012-1220-430000. The work of the second author was carried out within the framework of the DIANA-APPLICATIONS-Finding Hidden Knowledge in Texts: Applications (TIN2012-38603-C02-01) project, and the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems. The work of the fifth author was funded in part by the Swiss National Science Foundation (SNF) project "Mining Conversational Content for Topic Modelling and Author Identification (ChatMiner)" under grant number 200021_130208.

    Rangel, F.; Rosso, P.; Koppel, M.; Stamatatos, E.; Inches, G. (2013). Overview of the Author Profiling Task at PAN 2013. CLEF Conference on Multilingual and Multimodal Information Access Evaluation. 352-365. http://hdl.handle.net/10251/46636

    Magnetic Resonance Imaging Confirmed Olfactory Bulb Reduction in Long COVID-19: Literature Review and Case Series

    An altered sense of smell and taste was recognized as one of the most characteristic symptoms of coronavirus infection disease (COVID-19). Although most patients experience a complete functional resolution, there is a 21.3% prevalence of persistent alteration at 12 months after infection. To date, magnetic resonance imaging (MRI) findings in these patients have been variable and not clearly defined. We aimed to clarify radiological alterations of olfactory pathways in patients with long COVID-19 characterized by olfactory dysfunction. A comprehensive review of the English literature was performed by analyzing relevant papers about this topic. A case series was presented: all patients underwent complete otorhinolaryngology evaluation including the Sniffin’ Sticks battery test. A previous diagnosis of SARS-CoV-2 infection was confirmed by positive swabs. The MRIs were acquired using a 3.0T MR scanner with a standardized protocol for olfactory tract analysis. Images were first analysed by a dedicated neuroradiologist and subsequently reviewed and compared with the previous available MRIs. The review of the literature retrieved 25 studies; most cases of olfactory dysfunction more than 3 months after SARS-CoV-2 infection showed olfactory bulb (OB) reduction. Patients in the personal case series had asymmetry and a reduction in the volume of the OB. This evidence was strengthened by the comparison with a previous MRI, where the OBs were normal. The results preliminarily confirmed OB reduction in cases of long COVID-19 with an altered sense of smell. Further studies are needed to clarify the epidemiology, pathophysiology and prognosis.

    Statistics of online user-generated short documents

    User-generated short documents assume an important role in online communication due to the established utilization of social networks and real-time text messaging on the Internet. In this paper we compare the statistics of different online user-generated datasets and traditional TREC collections, investigating their similarities and differences. Our results support the applicability of traditional techniques also to user-generated short documents, albeit with proper preprocessing.

    Finding Participants in a Chat: Authorship Attribution for Conversational Documents

    In this work we study the problem of Authorship Attribution for a novel set of documents, namely online chats. Although the problem of Authorship Attribution has been extensively investigated for different document types, from books to letters and from emails to blog posts, to the best of our knowledge this is the first study of Authorship Attribution for conversational documents (IRC chat logs) using statistical models. We experimentally demonstrate the unsuitability of the classical statistical models for conversational documents and propose a novel approach that achieves a high accuracy rate (up to 95%) for hundreds of authors.